Abstract

The asymptotic distribution of an M-estimator is studied when the underlying
distribution is discrete. Asymptotic normality is shown to hold quite
generally within the assumed parametric family. When the specification of the
model is inexact, however, it is demonstrated that an M-estimator with a
non-smooth score function, e.g. a Huber estimator, has a non-normal limiting
distribution at certain distributions, resulting in unstable inference in the
neighborhood of such distributions. Consequently, smooth score functions are
proposed for discrete data.


AMS  1970  Subject  Classification:  62E20,  62F10,  62G35. 
1.  Introduction

M-estimation, originally proposed by Huber (1964) to estimate a location
parameter robustly, has since been applied successfully to a variety of
estimation problems where stability of the estimates is a concern. There is,
for instance, a substantial body of literature on M-estimation for regression
models; see Krasker and Welsch (1982) for a recent review. For further
references on M-estimation, see Huber (1981).

Much of the popularity of M-estimators can be attributed to their
flexibility. Desired properties of an M-estimator, such as relative
insensitivity to or rejection of extremely outlying data points, can be
specified in a direct way since the influence function of an M-estimator is
proportional to its score function; see Hampel (1974) or Huber (1981) for
details.

Surprisingly,  M-estimation  for  discrete  data  seems  to  have  received  little 
attention.  Discrete  data  are  no  less  prone  than  continuous  measurements  to 
outliers  or  partial  deviations  from  an  otherwise  reasonable  model;  see,  for 
instance,  data  from  mutation  research  presented  in  Simpson  (1985).  This 
paper  investigates  some  aspects  of  M-estimation  for  discrete  data. 

A useful optimality theory has been developed by Hampel (1968, 1974) for
robust M-estimation of a univariate parameter. His general prescription
facilitates the construction of robust M-estimators with nearly optimum
efficiency at a specified model. Proposals for robust estimation of the
binomial and Poisson parameters, for instance, can be found in Hampel (1968).
Hampel's univariate theory is briefly reviewed in Section 2. Extensions of
this optimality theory to certain multivariate models are discussed in
Krasker (1980), Krasker and Welsch (1982), Ruppert (1985), and Stefanski,
Carroll, and Ruppert


The score function for Hampel's optimal M-estimator is not smooth, that
is, it is not everywhere differentiable. This can lead to complications in
the asymptotic theory when the data are discrete. For instance, Huber (1981,
p. 51) considers the case where the underlying distribution is a mixture of
a smooth distribution and a point mass. He observes that if the point mass
is at a discontinuity of the derivative of the score function, then an
M-estimate for location has a non-normal limiting distribution. Along the same
lines, Hampel (1968, p. 97) notes that the optimal M-estimate for the Poisson
parameter is asymptotically normal at the Poisson distribution, provided the
truncation points of the score function are not integers. He conjectures
that "under any Poisson distribution, it is asymptotically normal (with the
usual variance); however, this remains to be seen."

This paper provides extensions to the asymptotic distribution theory of
M-estimators especially relevant to discrete data, although Theorem 1 is
somewhat broader in scope. The main results are given in Section 3. Among
the applications of the theory are a more complete account of the asymptotics
of the Huber M-estimate for location and a proof of Hampel's conjecture.

Aside from providing a more complete asymptotic theory for M-estimation, the
results have implications for choosing a score function when the data are
discrete. These are discussed in the final sections. In particular, smooth
score functions are proposed.

2.  Parametric  M-estimation:  Definitions,  optimality  and  examples 

Suppose $X_1, X_2, \ldots$ are independent observations, each thought to have
distribution function (d.f.) $F_\theta$, where $\theta$ belongs to a parameter
set $\Theta$; here $\Theta$ is a subset of $R^d$, $d \ge 1$. Define

(2.1)  $M(t;\psi,F) = \int \psi(x,t)\,dF(x)$,

where $F$ is a d.f. on $R^1$, $\psi(\cdot,\cdot)$ is a measurable real-valued
function on $R^1 \times \Theta$, and $t \in \Theta$. Then $T_n$ is an
M-estimator for $\theta$, based on a sample of size $n$, if it solves an
equation of the form

(2.2)  $M(T_n;\psi,F_n) = 0$,

where $F_n$ is the empirical d.f. The standard requirement

(2.3)  $M(\theta;\psi,F_\theta) = 0$,  $\theta \in \Theta$,

and additional regularity conditions ensure that $T_n$ consistently estimates
$\theta$ when the model is correct.

Suppose now that $\Theta \subset R^1$. The influence function at $F_\theta$
of an M-estimator for $\theta$ has the form

    $\Omega(x,\theta) = \dfrac{\psi(x,\theta)}{-\int \{(\partial/\partial\theta)\psi(x,\theta)\}\,dF_\theta(x)}$,

provided this exists. Assume $F_\theta$ has a density $f_\theta$ with respect
to a suitable measure, and assume the parametrization is smooth. Letting
$\ell(x,\theta) = (\partial/\partial\theta)\log f_\theta(x)$, the optimal
score according to Hampel's criterion has the form

(2.4)  $\psi_{c(\theta)}(\ell(x,\theta) - a(\theta))$,

where

    $\psi_c(u) = u$ if $|u| \le c$,  and  $\psi_c(u) = c\,\mathrm{sign}(u)$ if $|u| > c$,

and $a$ is defined implicitly by (2.3). This estimator cannot be dominated
by any M-estimator simultaneously with respect to the asymptotic variance and
the bound on the influence function at $F_\theta$. This is assuming, of
course, that the estimator is asymptotically normal at $F_\theta$.

The truncation point $c(\theta)$ determines the bounds on
$\Omega(\cdot,\theta)$ and hence the robustness of the estimator to outlying
data points. Observe that the maximum likelihood estimator has the form (2.4)
with $c(\theta) = \infty$ and $a(\theta) = 0$.

Two examples given in Hampel (1968) will be of special interest here.

Example 1.  If $F_\theta$ is the normal d.f. with mean $\theta$ and unit
variance then $\ell(x,\theta) = x - \theta$. By symmetry $a(\theta) = 0$, and
constant variance suggests setting $c(\theta) = c$. The resulting estimator,
with score $\psi_c(x-\theta)$, is the Huber (1964) M-estimator for location.

Example 2.  If $F_\theta$ is the Poisson d.f., with density
$f_\theta(x) = e^{-\theta}\theta^x/x!$ on $x = 0,1,2,\ldots$, then
$\ell(x,\theta) = x\theta^{-1} - 1$. Hampel (1968, p. 96) suggests taking
$c(\theta) = c\theta^{-1/2}$ on the grounds that $\ell(x,\theta)$ has standard
deviation $\theta^{-1/2}$. For this choice (2.4) is equivalent to
$\psi_c(x\theta^{-1/2} - \theta^{1/2} - a(\theta))$. The version

(2.5)  $\psi_c(x\theta^{-1/2} - \beta(\theta))$,

where $\beta(\theta) = \theta^{1/2} + a(\theta)$ is defined by (2.3), is
slightly more convenient.
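Because $\beta(\theta)$ in (2.5) is defined only implicitly through (2.3), it may help to see how it could be computed. The following is a minimal Python sketch, not part of the original development: it assumes the Poisson expectation can be truncated at a large cutoff (the names `centering_beta`, `mean_score` and the cutoff rule are hypothetical) and solves (2.3) by bisection, using the fact that the mean score is nonincreasing in $\beta$.

```python
import math

def huber_psi(u, c):
    """psi_c(u): equal to u on [-c, c] and to c*sign(u) outside."""
    return max(-c, min(c, u))

def poisson_probs(theta, x_max):
    """P(X = 0), ..., P(X = x_max) for X ~ Poisson(theta), by recursion."""
    probs = [math.exp(-theta)]
    for x in range(x_max):
        probs.append(probs[-1] * theta / (x + 1))
    return probs

def mean_score(theta, beta, c, probs):
    """Truncated sum for E psi_c(X theta^{-1/2} - beta) under Poisson(theta)."""
    root = math.sqrt(theta)
    return sum(huber_psi(x / root - beta, c) * p for x, p in enumerate(probs))

def centering_beta(theta, c, n_iter=200):
    """Solve (2.3) for beta(theta) by bisection (assumed cutoff, see lead-in)."""
    # Hypothetical cutoff: the Poisson mass beyond x_max is negligible.
    x_max = int(theta + 40.0 * math.sqrt(theta)) + 50
    probs = poisson_probs(theta, x_max)
    lo = -c                              # every term equals +c, mean score > 0
    hi = c + x_max / math.sqrt(theta)    # every retained term equals -c, mean score < 0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mean_score(theta, mid, c, probs) > 0.0:
            lo = mid                     # root lies above mid
        else:
            hi = mid                     # root lies at or below mid
    return 0.5 * (lo + hi)

beta_small = centering_beta(0.1, 1.5)
```

For $\theta = 0.1$ and $c = 1.5$ only $x = 0$ falls in the linear part of $\psi_c$, so (2.3) reduces to $-\beta e^{-\theta} + c(1 - e^{-\theta}) = 0$ and the routine should return approximately $c(e^{\theta} - 1)$.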

3.  Extended  asymptotic  distribution  theory 

Conditions for consistency of an M-estimator can be found in Huber (1964,
1967, 1981). Since the smoothness of $\psi$ plays no role in the consistency
proofs, consistency will usually be assumed here.

Huber (1981, Theorems 3.2.4 and 6.3.1) shows under quite general conditions
that if $T_n \to \theta = T(G)$ in probability as $n \to \infty$ then

(3.1)  $-n^{1/2}M(T_n;\psi,G) = n^{-1/2}\sum_{i=1}^{n}\psi(X_i,\theta) + o_p(1)$,

where $M$ is given by (2.1). In particular, $\psi$ need not be differentiable;
monotonicity or Lipschitz continuity conditions are sufficient. That $T_n$ is
asymptotically normal follows immediately from (3.1) provided $M(t;\psi,G)$
has a nonzero derivative at $\theta$ and
$0 < \int\psi^2(\cdot,\theta)\,dG < \infty$; see Corollary 6.3.2 of Huber
(1981). For stronger almost sure representations for $T_n$ under stronger
conditions, see Carroll (1978a, 1978b).

To avoid Lipschitz conditions for score functions like (2.5) that have
implicitly defined centering parameters, the following lemma is useful. The
proof is contained in the proof of Theorem 2.2 of Boos and Serfling (1980).

Denote by $\|\cdot\|_v$ the total variation norm, given by

    $\|h\|_v = \lim \sup \sum_{i=1}^{k} |h(x_i) - h(x_{i-1})|$,

where the supremum is over partitions $a = x_0 < x_1 < \ldots < x_k = b$ of
$[a,b]$, and the limit is as $a \to -\infty$, $b \to \infty$.

Lemma 1.  Let $X_1, \ldots, X_n$ be independent, each with d.f. $G$, and let
$\theta = T(G)$. Suppose $\psi(x,t)$ is continuous in $x$ for
$t \in \Theta \subset R^d$ and

    $\lim_{t \to \theta} \|\psi(\cdot,t) - \psi(\cdot,\theta)\|_v = 0$.

If $T_n \to \theta$ in probability as $n \to \infty$, then (3.1) holds.

Remark.  The score functions of Examples 1 and 2 are continuous in total
variation. For the former see Boos and Serfling (1980). For the latter, see
Simpson (1985).
When the underlying distribution is discrete, the set of points where $\psi$
fails to have a derivative has positive probability for certain parameter
values. In light of (3.1), it is natural to ask whether $M$ can have a
derivative at such parameter values, i.e., whether $T_n$ can be asymptotically
normal.

The following theorem addresses this question. For
$\theta \in \Theta \subset R^d$, $F_\theta$ is assumed to have a density
$f_\theta = f(\cdot,\theta)$ with respect to a $\sigma$-finite measure $\mu$,
and $\psi_\theta = \psi(\cdot,\theta)$ is measurable for each $\theta$. Let
$\|\cdot\|$ denote any norm on $R^d$ equivalent to the Euclidean norm. Some
regularity conditions are needed:

(A1)  There are measurable functions $\omega_t = \omega(\cdot,t)$ and
$g_t = g(\cdot,t)$ for which $\int\omega_t f_t\,d\mu$,
$\int|\psi_t| g_t\,d\mu$ and $\int\omega_t g_t\,d\mu$ are finite and, for some
$\delta > 0$,

    (i)  $|f_s - f_t| \le \|s-t\|\,g_t$, and

    (ii)  $|\psi_s| \le \omega_t$

almost everywhere [$\mu$] (a.e.) when $\|s-t\| \le \delta$;

(A2)  There is a measurable $R^d$-valued function $\dot f_t = \dot f(\cdot,t)$
such that

    $|f_s - f_t - (s-t)^T \dot f_t| = o(\|s-t\|)$ a.e.;

(A3)  $\psi_s \to \psi_t$ a.e. as $s \to t$.

Theorem 1.  If for each $t \in \Theta$ (A1)-(A3) hold and

(3.2)  $M(t;F_t) = 0$

then

(3.3)  $D_s M(s;F_t)|_{s=t} = -\int \psi_t \dot f_t\,d\mu$,

where $D_s$ denotes vector differentiation, and where the dependence of $M$
on $\psi$ has been suppressed.


Proof.  For $s,t \in \Theta$

    $0 = M(s;F_t) - M(t;F_t) + M(s;F_s) - M(s;F_t)$

    $\;\; = M(s;F_t) - M(t;F_t)$

(3.4)  $\;\;\;\; + M(t;F_s) - M(t;F_t) + R_t(s)$,

where $R_t(s) = \int(\psi_s - \psi_t)(f_s - f_t)\,d\mu$ and (3.2) was used.
The integrand of $R_t(s)$ is dominated in absolute value by
$2\|s-t\|\,\omega_t g_t$ on $\|s-t\| \le \delta$ because of (A1). Hence, by
(A3) and Dominated Convergence,

(3.5)  $R_t(s) = o(\|s-t\|)$.

Similarly, (A2) and Dominated Convergence imply

(3.6)  $|M(t;F_s) - M(t;F_t) - (s-t)^T \int\psi_t \dot f_t\,d\mu|
       \le \int |\psi_t|\,|f_s - f_t - (s-t)^T \dot f_t|\,d\mu
       = o(\|s-t\|)$  as $s \to t$,

since the integrand is dominated by $2\|s-t\|\,|\psi_t| g_t$ on
$\|s-t\| \le \delta$. From (3.4) to (3.6) conclude

    $|M(s;F_t) - M(t;F_t) + (s-t)^T \int\psi_t \dot f_t\,d\mu| = o(\|s-t\|)$.

Hence $D_s M(s;F_t)$ exists at $t$ and is given by (3.3).

Remarks.  1.  Note that $\psi$ need not be differentiable.

2.  When  (3.3)  generalizes  the  usual  information  identity. 
3.  Huber (1981, p. 51) observes a special case, namely (3.3) holds when $\mu$
is Lebesgue measure, $\psi(x,t) = \psi(x-t)$, where $\psi(\cdot)$ is
skew-symmetric about zero, and $f(x,t) = f(x-t)$, where $f(\cdot)$ is
differentiable and symmetric about zero.

4.  Equation (3.3), when it holds, also guarantees that the influence function
at the model, given by

    $-\{D_s M(s;F_t)|_{s=t}\}^{-1}\psi(x,t)$,

is defined for each $t \in \Theta$, provided that
$\int\psi_t \dot f_t\,d\mu \ne 0$.

Example 2 (continued).  Suppose $f(x,t) = e^{-t}t^x/x!$ on $\{0,1,2,\ldots\}$,
$t > 0$. Recall that the optimal M-estimator has the score
$\psi(x,t) = \psi_c(xt^{-1/2} - \beta)$. This estimator is known to be
asymptotically normal at the Poisson distribution when $t$ is in one of the
open intervals where neither of the truncation points $t^{1/2}(\beta \pm c)$
is an integer; see Hampel (1968, p. 97).

To show that it is asymptotically normal at every Poisson distribution,
as conjectured by Hampel, first use Theorem 1 with
$g(x,t) = e^{\delta}f(x-1,t+\delta) + \delta^{-1}(e^{\delta}-1-\delta)f(x,t)$,
$\omega(x,t) = c$ and $\dot f(x,t) = f(x-1,t) - f(x,t)$. Note that $c > 1$ is
sufficient for $\beta$ to be continuous, and hence for (A3); see Simpson
(1985).

Since Lemma 1 applies and $0 < \int\psi_t^2 f_t\,d\mu < c^2$ for $c > 1$, it
follows that the estimator is asymptotically normal at every Poisson
distribution if it is consistent. For consistency see Hampel (1968, p. 96)
and Theorem 2 of Huber (1967).
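A small numerical check can make the content of (3.3) concrete at a Poisson parameter where a truncation point is an integer. The Python sketch below is illustrative only. It assumes a truncated Poisson sum and a bisection solver for $\beta$ (both hypothetical helpers), chooses $t = 0.25$ and $c = t^{-1/2}e^{-t}$ so that the upper truncation point equals 1, and compares one-sided difference quotients of $s \mapsto M(s;F_t)$ at $s = t$ with $-\int\psi_t \dot f_t\,d\mu$, where $\dot f(x,t) = f(x-1,t) - f(x,t)$.

```python
import math

def psi_c(u, c):
    """Huber's psi_c: identity on [-c, c], clipped outside."""
    return max(-c, min(c, u))

def pmf(t, x_max=80):
    """Poisson(t) probabilities f(0,t), ..., f(x_max,t); the cutoff is an assumption."""
    probs = [math.exp(-t)]
    for x in range(x_max):
        probs.append(probs[-1] * t / (x + 1))
    return probs

def M(s, probs, beta_s, c):
    """M(s; F_t) = sum_x psi_c(x s^{-1/2} - beta(s)) f(x,t), with probs = f(., t)."""
    return sum(psi_c(x / math.sqrt(s) - beta_s, c) * p for x, p in enumerate(probs))

def beta(s, c, x_max=80):
    """Centering beta(s) solving M(s; F_s) = 0 by bisection (M is nonincreasing in beta)."""
    probs = pmf(s, x_max)
    lo, hi = -c, c + x_max / math.sqrt(s)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if M(s, probs, mid, c) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t = 0.25
c = math.exp(-t) / math.sqrt(t)   # makes the upper truncation point equal to 1
probs_t = pmf(t)
b_t = beta(t, c)
step = 1e-5                        # finite-difference step (illustrative choice)

# One-sided difference quotients of s -> M(s; F_t) at s = t.
left = (M(t, probs_t, b_t, c) - M(t - step, probs_t, beta(t - step, c), c)) / step
right = (M(t + step, probs_t, beta(t + step, c), c) - M(t, probs_t, b_t, c)) / step

# Right-hand side of (3.3) with f_dot(x,t) = f(x-1,t) - f(x,t).
rhs = -sum(
    psi_c(x / math.sqrt(t) - b_t, c) * ((probs_t[x - 1] if x > 0 else 0.0) - probs_t[x])
    for x in range(len(probs_t))
)
```

Under these assumptions the two difference quotients should agree with each other and with `rhs` up to discretization error, even though $\psi(x,\cdot)$ itself has a kink at this $t$ for $x = 1$.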

In Theorem 1, (3.2) allows smoothness of the parametrization to be
substituted for smoothness of $\psi$ within the assumed parametric model, so
that the estimator is asymptotically normal under further conditions. If the
specification of the model is inexact (as is often suspected), no result like
(3.3) is available. In certain cases, it is still possible to obtain the
limiting distribution of $T_n$ from (3.1).

Assume for simplicity that $\Theta$ is an open subset of the real line. The
score functions used for robust estimation are generally at least piecewise
differentiable. The one-sided derivatives of $M(t;G)$ will then exist, in
general, even when $M$ fails to be differentiable. Write

    $m(t;G) = (d/dt)M(t;G)$

when the derivative exists. By a well-known result from calculus, if
$m(\theta-;G)$ and $m(\theta+;G)$ exist, they are equal to the corresponding
one-sided derivatives of $M(t;G)$ at $\theta$; see, e.g., Franklin (1940,
p. 118).

Theorem 2.  Suppose for some $\theta$ interior to $\Theta$ that
$M(\theta;G) = 0$, and let $T_n$ be a zero of $M(t;F_n)$, $n = 1,2,\ldots$,
where $F_n$ is the empirical d.f. Assume the following:

(B1)  $m(\theta-;G)$ and $m(\theta+;G)$ exist finitely and are non-zero and of
the same sign;

(B2)  $0 < \sigma^2 < \infty$, where $\sigma^2 = \int\psi_\theta^2\,dG$;

(B3)  $T_n \to \theta$ in probability as $n \to \infty$, and (3.1) holds.

Then

(3.7)  $\lim_{n\to\infty} \sup_{-\infty<z<\infty}
       |\mathrm{pr}\{n^{1/2}(T_n - \theta) \le z\} - H(z)| = 0$,

where

    $H(z) = \Phi(|m(\theta+;G)|z/\sigma)$ for $z \ge 0$,
    $H(z) = \Phi(|m(\theta-;G)|z/\sigma)$ for $z < 0$,

and $\Phi$ is the standard normal d.f.
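The limit law $H$ in Theorem 2 is easy to evaluate directly. As a minimal Python sketch (the function names are hypothetical), the following computes $H(z)$ from the two one-sided derivatives and $\sigma$, using the error function for the standard normal d.f.

```python
import math

def std_normal_cdf(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_piece_limit_cdf(z, m_plus, m_minus, sigma):
    """H(z) of Theorem 2: |m(theta+)| scales z >= 0, |m(theta-)| scales z < 0."""
    slope = abs(m_plus) if z >= 0 else abs(m_minus)
    return std_normal_cdf(slope * z / sigma)

# Illustrative values only: unequal one-sided derivatives give unequal tails.
lower_tail = two_piece_limit_cdf(-1.645, m_plus=-0.5, m_minus=-1.0, sigma=1.0)
upper_tail = 1.0 - two_piece_limit_cdf(1.645, m_plus=-0.5, m_minus=-1.0, sigma=1.0)
```

When the two one-sided derivatives coincide, $H$ reduces to a single normal d.f.; otherwise the two tails are governed by different scales, which is the source of the distorted tail probabilities discussed in Section 4.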
Remarks.  1.  Huber (1964, p. 78) alludes to a similar result for a location
estimator.

2.  The requirement that $m(\theta\pm;G)$ have the same sign is actually
implied by the remaining conditions. If the one-sided derivatives were to have
opposite signs, $M(t;G)$ would not change sign in a neighborhood of $\theta$
and (3.1) would not hold.

The  proof  of  Theorem  2  is  deferred  to  the  Appendix. 

Example 1 (continued).  Recall that the Huber M-estimator for location has the
score $\psi(x,t) = \psi_c(x-t)$. For any d.f. $G$,
$M(-\infty;G) = c = -M(\infty;G)$, and $M(t;G)$ is continuous in $t$ so it has
a zero $\theta$. Assume $\theta = 0$. This is unique if $G(c-) > G(-c+)$, in
which case $T_n \to 0$ in probability by Proposition 2.2.1 of Huber (1981).
Since $\psi$ is continuous in total variation, (3.1) holds by Lemma 1. Letting
$\dot\psi(x,t) = (d/dt)\psi_c(x-t) = -\psi_c'(x-t)$ if it exists, observe
that $-\dot\psi(x,t-) = I(-c \le x-t < c)$ and
$-\dot\psi(x,t+) = I(-c < x-t \le c)$, where $I(\cdot)$ denotes the indicator
function. Bounded convergence yields $-m(0-;G) = G(c-) - G(-c-)$ and
$-m(0+;G) = G(c+) - G(-c+)$. Hence, by Theorem 2, $n^{1/2}T_n$ is
asymptotically normal if $G(c+) - G(c-) = G(-c+) - G(-c-)$; otherwise, it has
a limiting distribution consisting of the left and right halves of two normal
distributions with different variances (cf. Huber (1981, p. 51)).

4.  A  counterexample 

It is instructive to examine the extent of the non-normality that occurs
in a specific example. Consider again the optimal M-estimator for the Poisson
parameter. The score function is

    $\psi(x,t) = \psi_c(xt^{-1/2} - \beta) =$
        $-c$,                     $x \le \ell(t)$,
        $xt^{-1/2} - \beta$,      $\ell(t) < x < h(t)$,
        $c$,                      $h(t) \le x$,

where $\ell(t) = t^{1/2}(\beta(t) - c)$ and $h(t) = t^{1/2}(\beta(t) + c)$.


Let $G$ be the actual d.f. and let $\theta = T(G)$. The simplest situation is
when $\theta$ is small. Assume henceforth that
$\ell(\theta) < 0 < h(\theta) = 1$. Calculation yields
$\beta(t) = c(e^t - 1)$ for $\ell(t) < 0$, $0 < h(t) < 1$, and
$\beta(t) = c\{e^t(1+t)^{-1} - 1\} + t^{1/2}(1+t)^{-1}$ for $\ell(t) < 0$,
$1 < h(t) < 2$. Since $\beta$ is continuous, equating the two expressions
at $\theta$ gives

(4.1)  $\theta^{1/2}e^{\theta} = c^{-1}$.

The one-sided derivatives of $\beta$ at $\theta$ are
$\beta'(\theta-) = ce^{\theta}$ and
$\beta'(\theta+) = \frac12 ce^{\theta}(1+\theta)^{-1}$, where (4.1) was used.
Note that $\beta$ is strictly increasing at $\theta$. Since
$\psi_c'(c-) = 1$ and $\psi_c'(c+) = 0$,

(4.2)  $-\dot\psi(x,\theta-) =$
        $ce^{\theta}$,   $x = 0$,
        $0$,             $x = 1,2,\ldots$

and

(4.3)  $-\dot\psi(x,\theta+) =$
        $\frac12 ce^{\theta}(1+\theta)^{-1}$,                      $x = 0$,
        $\frac12 ce^{\theta}\{\theta^{-1} + (1+\theta)^{-1}\}$,    $x = 1$,
        $0$,                                                       $x = 2,3,\ldots$

Suppose $G$ is a mixture of a Poisson distribution and a point mass at an
integer $z$, i.e., $G = (1-\varepsilon)F_t + \varepsilon\delta_z$. Assume
$z > h(t)$ so $\dot\psi(z,\theta\pm) = 0$. From (4.2) and (4.3)

(4.4)  $\dfrac{m(\theta+;G)}{m(\theta-;G)}
       = \dfrac12\left(\dfrac{t}{\theta} + \dfrac{1+t}{1+\theta}\right)$,

where $m(\theta-;G) = -ce^{\theta-t}(1-\varepsilon)$. The ratio (4.4) is unity
only when $t = \theta$, which corresponds to $\varepsilon = 0$ or $z = t$. By
Theorem 2, the limiting distribution of $n^{1/2}(T_n - \theta)$ consists of
the right and left halves of two normal distributions. The ratio of their
standard deviations is (4.4).

Solving $0 = M(\theta;G) = c\{1 - (1-\varepsilon)e^{\theta-t}\}$ yields
$t = \theta + \log(1-\varepsilon)$. Table 1 shows the values of $t$ and (4.4)
for several values of $\varepsilon$ when $\theta = 0.25$ and
$c = \theta^{-1/2}e^{-\theta} = 1.5576\ldots$ (see (4.1)). In addition, the
effect on a nominal .05 tail probability is shown.
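The quantities just described follow directly from (4.1), (4.4) and the relation $t = \theta + \log(1-\varepsilon)$, so they can be recomputed in a few lines. The Python sketch below is illustrative (the function name `contaminated_row` is hypothetical); it returns $t$, the ratio $r$ of (4.4), and the distorted tail probability $\Phi(-1.645r)$ for a given contaminating mass $\varepsilon$.

```python
import math

def std_normal_cdf(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

theta = 0.25
c = math.exp(-theta) / math.sqrt(theta)   # from (4.1): theta^{1/2} e^{theta} = 1/c

def contaminated_row(eps, theta=theta):
    """Return (t, r, tail) for contaminating mass eps at fixed theta."""
    t = theta + math.log(1.0 - eps)                        # solves M(theta; G) = 0
    r = 0.5 * (t / theta + (1.0 + t) / (1.0 + theta))      # ratio (4.4)
    tail = std_normal_cdf(-1.645 * r)                      # effect on a nominal .05 tail
    return t, r, tail

rows = {eps: contaminated_row(eps) for eps in (0.0, 0.01, 0.10, 0.15)}
```

Values computed this way should agree with the tabulated entries up to rounding in the last reported digit.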

For very small values of $\varepsilon$ the effect is minimal, which accords
with the robustness of $T_n$ in the sense of weak* continuity (see Hampel
(1971)), since it is asymptotically normal at the model. As $\varepsilon$
increases, however, the effect becomes more serious, and inference based on
$T_n$ can be substantially biased.

For  related  work  see  Stigler  (1973),  who  observes  that  a  bias  of  this 
type  can  arise  when  the  trimmed  mean  is  used  for  discrete  or  grouped  data. 


Table 1  Effect of contaminating mass $\varepsilon$ with $\theta = 0.25$ fixed

  $\varepsilon$    $t$       $r$ = (4.4)    $\Phi(-1.645r)$

                   0.25      1              .05
  0.01             0.24      0.976          .054
                   0.199     0.877          .074
  0.10             0.145     0.748          .109
  0.15                       0.610          .158
                             0.465          .222

5.  Smooth  score  functions 


In the example of the preceding section, one might argue that the parameter
values where problems arise are unlikely to occur in practice, or that
$c$ can be changed slightly. It is not, however, the non-normal limiting
distribution of $T_n$ at certain distributions that is of concern, but the
instability of inference based on $T_n$ near those distributions. This
phenomenon can alternatively be interpreted as a discontinuity of the
asymptotic variance functional
$V(T(G);G) = \{m(T(G);G)\}^{-1}\int\psi_{T(G)}^2\,dG\,\{m(T(G);G)\}^{-1}$;
cf. Huber (1981, p. 51). In the neighborhood of a distribution where $V$ is
discontinuous, estimates of the variance of $T_n$ may be unstable.

Instability of this type can be avoided by requiring the M-estimator score
function to be smooth, for example, by replacing $\psi_c$ in (2.4) with a
smooth approximation. A natural way to construct such a function is by
rescaling a smooth distribution function.

Suppose $F$ is an absolutely continuous d.f. with density $f$ symmetric about
zero. Then

(5.1)  $\psi(x) = 2c\{F(x/(2cf(0))) - \frac12\}$

is monotone increasing, skew-symmetric about zero, and satisfies
$\psi(\infty) = c$ and $\psi'(0) = 1$. Observe that $\psi_c$ is obtained from
(5.1) by taking $F$ to be the uniform distribution on $[-\frac12,\frac12]$.
This can be approximated arbitrarily closely by a symmetric beta distribution
with a small value for the shape parameter, i.e.,
$f(x) \propto \{(\frac12 + x)(\frac12 - x)\}^{\alpha}$ on
$[-\frac12,\frac12]$. The resulting score function is complicated, however,
and its second derivative has jump discontinuities. A more convenient choice
is the logistic distribution, which leads to the smooth function

    $L_c(x) = c\tanh(x/c)$.

This has appeared previously. $L_c(x-t)$ is the maximum likelihood score for
the location of a logistic distribution with scale 1. Holland and Welsch
(1977) include an M-estimator using $L_c$ in a Monte Carlo study of robust
regression estimates.
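The construction (5.1) can be checked numerically for the two choices of $F$ mentioned above. The Python sketch below is a minimal illustration (the function names are hypothetical): it evaluates $2c\{F(x/(2cf(0))) - \frac12\}$ for the uniform d.f. on $[-\frac12,\frac12]$, where $f(0) = 1$, and for the standard logistic d.f., where $f(0) = \frac14$.

```python
import math

def score_from_cdf(x, c, cdf, density_at_zero):
    """(5.1): psi(x) = 2c * (F(x / (2 c f(0))) - 1/2)."""
    return 2.0 * c * (cdf(x / (2.0 * c * density_at_zero)) - 0.5)

def uniform_cdf(u):
    """D.f. of the uniform distribution on [-1/2, 1/2]; density 1 at zero."""
    return min(1.0, max(0.0, u + 0.5))

def logistic_cdf(u):
    """Standard logistic d.f., written to avoid overflow; density 1/4 at zero."""
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

c = 1.5
grid = [-4.0, -1.5, -0.3, 0.0, 0.7, 1.5, 4.0]
huber_values = [score_from_cdf(x, c, uniform_cdf, 1.0) for x in grid]
logistic_values = [score_from_cdf(x, c, logistic_cdf, 0.25) for x in grid]
```

On this grid the uniform choice should reproduce the truncated score $\psi_c$, and the logistic choice should reproduce $c\tanh(x/c)$, which is bounded by $c$ in absolute value and has unit slope at zero.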

For the important special case of estimating a Poisson parameter robustly,
a smooth version of the optimal M-estimator solves

(5.2)  $n^{-1}\sum_{i=1}^{n} L_c(X_i t^{-1/2} - \beta(t)) = 0$,

where $\beta$ is defined in the usual manner.

Table 2 gives asymptotic variances $V_\theta$ and bounds $\gamma_\theta$ on
influence functions for the estimator defined by (5.2), labeled $L_c$, and the
optimal estimator, labeled $\psi_c$. In each case $c = 1.5$. The calculations
are at the Poisson model, and $V_\theta$ and $\gamma_\theta$ are stabilized by
dividing by $\theta$ and $\theta^{1/2}$ respectively.

Note that $V_\theta/\theta$ is the asymptotic relative efficiency of the
maximum likelihood estimator (sample mean) with respect to the corresponding
M-estimator.

The asymptotic variances for the logistic score are slightly smaller than
those for the "optimal" score. This is possible because the bounds on the
influence function of $L_c$ are slightly higher than those for $\psi_c$. In
terms of performance at the model, there appears to be little difference
between $L_c$ and $\psi_c$.

Table 2  Asymptotic variances and influence function bounds at the Poisson model

(Column groups: Mean, $\psi_c$, $L_c$; entries $V_\theta/\theta$ and
$\gamma_\theta/\theta^{1/2}$ by $\theta$. Table body missing.)



6.  Further  remarks 


The need for smooth score functions is most clear when the data consist of
counts. In this case every deviation from the model involves point masses.

An important consequence of Theorem 1 is that Hampel's optimal estimator
(2.4) is indeed optimal as claimed when the model distribution is discrete.
It would be disturbing if the theory were to break down at a countable number
of parameter values. Moreover, the smooth versions discussed in Section 5,
which provide more stable inference, are justified for every parameter value
as being nearly optimal.

Although the discussion has focused on the score functions arising from
Hampel's optimality theory, it is not limited to that context. For instance,
a score based on Hampel's three part redescending $\psi$ (see Huber (1981,
p. 102)) will be prone to the same difficulties, and a smooth version will be
more stable.


Appendix.  Proof of Theorem 2.


Since the d.f. $H$ is continuous, uniform convergence in (3.7) will follow
from pointwise convergence via Polya's Theorem (Serfling (1980, p. 18)).

Write $M(t)$ for $M(t;G)$ and $m(t)$ for $m(t;G)$. Denote by $U(\delta)$ the
set $\{t: 0 < |t-\theta| < \delta\}$. By (B1), $m$ is defined on $U(\delta)$
if $\delta$ is sufficiently small. Moreover, given $\varepsilon > 0$, there is
a $\delta$ for which $t \in U(\delta)$ implies

    $|m(t) - m(\theta-)| < \varepsilon$  if $t < \theta$,

    $|m(t) - m(\theta+)| < \varepsilon$  if $t > \theta$.

Choosing $\varepsilon < \min\{|m(\theta-)|, |m(\theta+)|\}$ then guarantees
that $|m(t)|$ is bounded away from zero on $U(\delta)$. Fix such a $\delta$.


Since $M(\theta) = 0$, $t \in U(\delta)$ implies

(A.1)  $M(t) = m(x)(t-\theta)$

for some $x$ strictly between $t$ and $\theta$, by the Mean Value Theorem
(which only requires one-sided derivatives at the endpoints of the interval on
which it is applied). Since $m$ is bounded away from zero on $U(\delta)$,
(A.1) shows

    $|t-\theta| = O(|M(t)|)$

as $t \to \theta$. The right hand side of (A.1) equals

(A.2)  $D(t)(t-\theta) + R(t)$,

where

    $D(t) = m(\theta+)I(t>\theta) + m(\theta-)I(t<\theta)$,

    $R(t) = [\{m(x) - m(\theta+)\}I(t>\theta) + \{m(x) - m(\theta-)\}I(t<\theta)](t-\theta)$,

and $I(A)$ is the indicator for the set $A$. Note that (A.2) also holds if
$t = \theta$.

Since $R(t) = o(|t-\theta|) = o(|M(t)|)$, (A.1) and (A.2) yield

(A.3)  $D(T_n)n^{1/2}(T_n-\theta) = n^{1/2}M(T_n) + o_p(|n^{1/2}M(T_n)|)$.

Because of (B2), (B3) and the Lindeberg-Levy central limit theorem, the right
hand side of (A.3) converges in distribution to a $N(0,\sigma^2)$ random
variable, and, hence, so does the left hand side.

To obtain the limiting distribution of $T_n$, partition its range and consider
cases. If $z < 0$ then

    $\mathrm{pr}\{n^{1/2}(T_n-\theta) \le z,\ T_n \ge \theta\} = 0$,

    $\mathrm{pr}\{n^{1/2}(T_n-\theta) \le z,\ T_n < \theta\}
       = \mathrm{pr}\{|D(T_n)|n^{1/2}(T_n-\theta) \le |D(T_n)|z\}$.

Since $D(T_n) = m(\theta-)$ when $T_n < \theta$, and $D(t)$ does not change
sign on $(\theta-\delta,\theta+\delta)$ by (B1), (A.3) implies that this last
probability converges to $\Phi(|m(\theta-)|z/\sigma)$ as $n \to \infty$.
Similar arguments establish that, for $z > 0$,

    $\mathrm{pr}\{n^{1/2}(T_n-\theta) \le z,\ T_n < \theta\}
       = \mathrm{pr}\{|m(\theta-)|n^{1/2}(T_n-\theta) < 0\} \to \frac12$

and

    $\mathrm{pr}\{n^{1/2}(T_n-\theta) \le z,\ T_n > \theta\}
       = \mathrm{pr}\{0 < |m(\theta+)|n^{1/2}(T_n-\theta) \le z|m(\theta+)|\}
       \to \Phi(|m(\theta+)|z/\sigma) - \frac12$

and finally

    $\mathrm{pr}\{n^{1/2}(T_n-\theta) \le 0\}
       = 1 - \mathrm{pr}\{|m(\theta+)|n^{1/2}(T_n-\theta) > 0\} \to \frac12$

as $n \to \infty$.

The result follows by collecting terms.